Overview

Dataset statistics

Number of variables: 15
Number of observations: 5680
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 665.8 KiB
Average record size in memory: 120.0 B
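
The memory figures are consistent with every column taking 8 bytes per row: 64-bit numbers for the numeric columns and, presumably, 8-byte object references for the three categorical columns when the strings themselves are not counted. A quick check of that arithmetic, where the 8-byte cell size is the assumption being tested:

```python
N_ROWS = 5680       # number of observations
N_COLUMNS = 15      # number of variables
BYTES_PER_CELL = 8  # assumption: 64-bit values or 8-byte object references

column_bytes = N_ROWS * BYTES_PER_CELL
column_kib = column_bytes / 1024
record_bytes = N_COLUMNS * BYTES_PER_CELL
columns_total_kib = N_COLUMNS * column_bytes / 1024

print(f"per column:  {column_kib:.1f} KiB")         # 44.4, the memory size shown for every variable
print(f"per record:  {record_bytes:.1f} B")         # 120.0, the average record size
print(f"all columns: {columns_total_kib:.1f} KiB")  # 665.6
```

The computed total of 665.6 KiB is slightly below the reported 665.8 KiB; the small gap (under 0.3 KiB) is presumably the DataFrame index, which this sketch does not model.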

Variable types

Numeric: 12
Categorical: 3

Warnings

qty_items is highly correlated with gross_revenue [High correlation]
gross_revenue is highly correlated with qty_items [High correlation]
gross_revenue is highly skewed (γ1 = 23.11421912) [Skewed]
qty_items is highly skewed (γ1 = 25.14834388) [Skewed]
avg_ticket is highly skewed (γ1 = 48.348493) [Skewed]
qty_returns is highly skewed (γ1 = 29.65764799) [Skewed]
customer_id has unique values [Unique]
qty_returns has 4192 (73.8%) zeros [Zeros]
returns_ratio has 4192 (73.8%) zeros [Zeros]
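
The Skewed warnings compare each column's sample skewness with a threshold. The sketch below computes the bias-adjusted Fisher-Pearson coefficient G1, the estimator pandas' `Series.skew()` returns. The threshold of 20 and the use of the absolute value are assumptions, not values read from this report's configuration, although 20 fits the report: the four flagged columns all have γ1 > 23, while qty_products (17.74) and avg_basket_size (14.70) are not flagged.

```python
import math

SKEW_THRESHOLD = 20  # assumption: consistent with this report, not taken from its config


def sample_skewness(values):
    """Bias-adjusted Fisher-Pearson skewness G1, the estimator pandas' Series.skew() returns."""
    n = len(values)
    if n < 3:
        raise ValueError("skewness needs at least 3 values")
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # biased second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # biased third central moment
    if m2 == 0:
        return 0.0  # constant column: no asymmetry
    g1 = m3 / m2 ** 1.5
    return math.sqrt(n * (n - 1)) / (n - 2) * g1


def skewed_columns(columns, threshold=SKEW_THRESHOLD):
    """Names of columns whose absolute skewness exceeds the threshold (absolute value assumed)."""
    return [name for name, values in columns.items()
            if abs(sample_skewness(values)) > threshold]


columns = {
    "long_tail": [0] * 999 + [1],  # one large value among many small ones
    "symmetric": [1, 2, 3, 4, 5],
}
print(skewed_columns(columns))  # ['long_tail']
```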

Reproduction

Analysis started: 2022-09-24 11:37:14.133470
Analysis finished: 2022-09-24 11:37:34.676444
Duration: 20.54 seconds
Software version: pandas-profiling v2.9.0

Variables

customer_id
Real number (ℝ≥0)

UNIQUE

Distinct: 5680
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 16605.09894
Minimum: 12347
Maximum: 22709
Zeros: 0
Zeros (%): 0.0%
Memory size: 44.4 KiB

Quantile statistics

Minimum: 12347
5th percentile: 12702.95
Q1: 14291.75
Median: 16231
Q3: 18212.25
95th percentile: 21743.2
Maximum: 22709
Range: 10362
Interquartile range (IQR): 3920.5

Descriptive statistics

Standard deviation: 2808.520003
Coefficient of variation (CV): 0.1691359993
Kurtosis: -0.8231342656
Mean: 16605.09894
Median absolute deviation (MAD): 1960
Skewness: 0.4403964428
Sum: 94316962
Variance: 7887784.608
Monotonicity: Not monotonic
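
Several of these statistics are tied together by simple identities, which makes them a quick consistency check on the report: CV = standard deviation / mean, variance = standard deviation squared, sum = mean × number of observations, range = maximum - minimum, and IQR = Q3 - Q1. The check below plugs in the customer_id figures reported above; the relative tolerance of 1e-6 is an assumption chosen to absorb the report's rounding.

```python
import math

N = 5680  # number of observations (dataset statistics)

# customer_id figures copied from the report
mean, std = 16605.09894, 2808.520003
cv, variance, total = 0.1691359993, 7887784.608, 94316962
minimum, maximum, q1, q3 = 12347, 22709, 14291.75, 18212.25

checks = {
    "CV = std / mean": math.isclose(std / mean, cv, rel_tol=1e-6),
    "variance = std ** 2": math.isclose(std ** 2, variance, rel_tol=1e-6),
    "sum = mean * N": math.isclose(mean * N, total, rel_tol=1e-6),
    "range = max - min": maximum - minimum == 10362,
    "IQR = Q3 - Q1": q3 - q1 == 3920.5,
}
for name, ok in checks.items():
    print(f"{name}: {'ok' if ok else 'MISMATCH'}")
```

The same identities can be applied to any of the other numeric variables.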

Histogram with fixed size bins (bins=50)

Common values
Value               Count  Frequency (%)
14333                   1  < 0.1%
17081                   1  < 0.1%
19160                   1  < 0.1%
15066                   1  < 0.1%
17117                   1  < 0.1%
15070                   1  < 0.1%
13023                   1  < 0.1%
15074                   1  < 0.1%
13027                   1  < 0.1%
17125                   1  < 0.1%
Other values (5670)  5670  99.8%

Minimum 5 values
Value               Count  Frequency (%)
12347                   1  < 0.1%
12348                   1  < 0.1%
12349                   1  < 0.1%
12350                   1  < 0.1%
12352                   1  < 0.1%

Maximum 5 values
Value               Count  Frequency (%)
22709                   1  < 0.1%
22708                   1  < 0.1%
22707                   1  < 0.1%
22706                   1  < 0.1%
22705                   1  < 0.1%

gross_revenue
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED

Distinct: 5423
Distinct (%): 95.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1760.008926
Minimum: 0.42
Maximum: 279138.02
Zeros: 0
Zeros (%): 0.0%
Memory size: 44.4 KiB

Quantile statistics

Minimum: 0.42
5th percentile: 13.3735
Q1: 237.5475
Median: 615.265
Q3: 1571.07
95th percentile: 5307.991
Maximum: 279138.02
Range: 279137.6
Interquartile range (IQR): 1333.5225

Descriptive statistics

Standard deviation: 7508.748872
Coefficient of variation (CV): 4.266312949
Kurtosis: 702.6362071
Mean: 1760.008926
Median absolute deviation (MAD): 480.515
Skewness: 23.11421912
Sum: 9996850.7
Variance: 56381309.62
Monotonicity: Not monotonic
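
The kurtosis figures in this report are excess kurtosis, where a normal distribution scores 0: raw kurtosis can never fall below 1, so the negative values for customer_id and recency_days only make sense as excess kurtosis. gross_revenue's 702.6 therefore signals extremely heavy tails. A minimal sketch of the bias-corrected estimator that pandas' `Series.kurt()` computes, which the report presumably uses:

```python
def sample_excess_kurtosis(values):
    """Bias-corrected excess kurtosis G2 (0 for a normal distribution), as in pandas' Series.kurt()."""
    n = len(values)
    if n < 4:
        raise ValueError("kurtosis needs at least 4 values")
    mean = sum(values) / n
    s2 = sum((x - mean) ** 2 for x in values)  # sum of squared deviations
    s4 = sum((x - mean) ** 4 for x in values)  # sum of fourth-power deviations
    if s2 == 0:
        return 0.0  # constant column
    numerator = n * (n + 1) * (n - 1) * s4
    denominator = (n - 2) * (n - 3) * s2 ** 2
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return numerator / denominator - correction


print(sample_excess_kurtosis([1, 2, 3, 4, 5]))        # about -1.2: lighter tails than a normal
print(sample_excess_kurtosis([0] * 999 + [1]) > 100)  # True: one extreme value dominates
```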

Histogram with fixed size bins (bins=50)

Common values
Value               Count  Frequency (%)
7.95                    9  0.2%
2.95                    8  0.1%
1.25                    8  0.1%
4.95                    8  0.1%
1.65                    7  0.1%
12.75                   7  0.1%
3.75                    7  0.1%
4.25                    6  0.1%
7.5                     6  0.1%
5.95                    6  0.1%
Other values (5413)  5608  98.7%

Minimum 5 values
Value               Count  Frequency (%)
0.42                    1  < 0.1%
0.65                    1  < 0.1%
0.79                    1  < 0.1%
0.84                    4  0.1%
0.85                    3  0.1%

Maximum 5 values
Value               Count  Frequency (%)
279138.02               1  < 0.1%
259657.3                1  < 0.1%
194550.79               1  < 0.1%
140450.72               1  < 0.1%
124564.53               1  < 0.1%

recency_days
Real number (ℝ≥0)

Distinct: 304
Distinct (%): 5.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 116.8264085
Minimum: 0
Maximum: 373
Zeros: 37
Zeros (%): 0.7%
Memory size: 44.4 KiB

Quantile statistics

Minimum: 0
5th percentile: 3
Q1: 22
Median: 71
Q3: 199.25
95th percentile: 338
Maximum: 373
Range: 373
Interquartile range (IQR): 177.25
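
A fractional quartile such as Q3 = 199.25 for a whole-number column like recency_days shows that quantiles are interpolated between neighbouring sorted values. The sketch below uses linear interpolation, the default method of pandas' `Series.quantile` and `numpy.quantile`; that the report uses exactly this method is an assumption consistent with these values. The `days` list is hypothetical, not data from the report.

```python
import math


def linear_quantile(values, q):
    """Quantile with linear interpolation between order statistics (pandas/NumPy default)."""
    if not values:
        raise ValueError("empty input")
    if not 0 <= q <= 1:
        raise ValueError("q must be in [0, 1]")
    ordered = sorted(values)
    position = (len(ordered) - 1) * q      # fractional index into the sorted data
    lower = math.floor(position)
    upper = min(lower + 1, len(ordered) - 1)
    weight = position - lower              # how far to move from ordered[lower] to ordered[upper]
    return ordered[lower] + (ordered[upper] - ordered[lower]) * weight


days = [3, 10, 22, 71, 150, 199, 200, 373]  # hypothetical recency values
print(linear_quantile(days, 0.75))  # 199.25: a quarter of the way from 199 to 200
print(linear_quantile(days, 0.5))   # 110.5
```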

Descriptive statistics

Standard deviation: 111.6124711
Coefficient of variation (CV): 0.9553702158
Kurtosis: -0.640424192
Mean: 116.8264085
Median absolute deviation (MAD): 61
Skewness: 0.8152565497
Sum: 663574
Variance: 12457.34369
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values
Value               Count  Frequency (%)
1                     110  1.9%
4                     105  1.8%
3                      98  1.7%
2                      92  1.6%
10                     86  1.5%
8                      82  1.4%
9                      79  1.4%
17                     79  1.4%
7                      77  1.4%
15                     66  1.2%
Other values (294)   4806  84.6%

Minimum 5 values
Value               Count  Frequency (%)
0                      37  0.7%
1                     110  1.9%
2                      92  1.6%
3                      98  1.7%
4                     105  1.8%

Maximum 5 values
Value               Count  Frequency (%)
373                    23  0.4%
372                    22  0.4%
371                    17  0.3%
369                     4  0.1%
368                    13  0.2%
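
The histograms use 50 bins of equal width between each column's minimum and maximum, so for recency_days (range 0 to 373) every bin is 373 / 50 = 7.46 days wide. A minimal sketch of the bin assignment, assuming NumPy-style edges in which every bin is half-open except the last, which also contains the maximum:

```python
def bin_index(value, minimum, maximum, bins=50):
    """Index (0 .. bins-1) of the equal-width bin holding value; the last bin includes maximum."""
    if not minimum <= value <= maximum:
        raise ValueError("value outside [minimum, maximum]")
    if maximum == minimum:
        return 0
    width = (maximum - minimum) / bins
    index = int((value - minimum) // width)
    return min(index, bins - 1)  # value == maximum could otherwise land in bin number `bins`


print((373 - 0) / 50)          # 7.46, the bin width for recency_days
print(bin_index(0, 0, 373))    # 0
print(bin_index(10, 0, 373))   # 1
print(bin_index(373, 0, 373))  # 49
```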
 

qty_invoices
Real number (ℝ≥0)

Distinct: 56
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.477112676
Minimum: 1
Maximum: 206
Zeros: 0
Zeros (%): 0.0%
Memory size: 44.4 KiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 1
Median: 1
Q3: 4
95th percentile: 11
Maximum: 206
Range: 205
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 6.821279289
Coefficient of variation (CV): 1.961765386
Kurtosis: 301.4225171
Mean: 3.477112676
Median absolute deviation (MAD): 0
Skewness: 13.17878586
Sum: 19750
Variance: 46.52985114
Monotonicity: Not monotonic
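
The median absolute deviation (MAD) appears to be the plain median of |x - median(x)|, without the 1.4826 factor sometimes applied to make it comparable to a standard deviation. This is an assumption, supported by customer_id's MAD of 1960 being about half its IQR of 3920.5. It explains why qty_invoices has MAD = 0: 2859 of 5680 customers (50.3%) have exactly one invoice, so the median is 1 and more than half of the absolute deviations are 0. The `invoices` list below is hypothetical.

```python
from statistics import median


def median_absolute_deviation(values):
    """Unscaled MAD: the median of the absolute deviations from the median."""
    center = median(values)
    return median(abs(x - center) for x in values)


# Hypothetical invoice counts in which more than half of the customers have one invoice.
invoices = [1, 1, 1, 1, 1, 1, 2, 3, 4, 206]
print(median(invoices))                     # 1.0
print(median_absolute_deviation(invoices))  # 0.0
```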

Histogram with fixed size bins (bins=50)

Common values
Value               Count  Frequency (%)
1                    2859  50.3%
2                     823  14.5%
3                     502  8.8%
4                     394  6.9%
5                     237  4.2%
6                     173  3.0%
7                     138  2.4%
8                      98  1.7%
9                      69  1.2%
10                     55  1.0%
Other values (46)     332  5.8%

Minimum 5 values
Value               Count  Frequency (%)
1                    2859  50.3%
2                     823  14.5%
3                     502  8.8%
4                     394  6.9%
5                     237  4.2%

Maximum 5 values
Value               Count  Frequency (%)
206                     1  < 0.1%
199                     1  < 0.1%
124                     1  < 0.1%
97                      1  < 0.1%
91                      2  < 0.1%

qty_items
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED

Distinct: 1837
Distinct (%): 32.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 951.6228873
Minimum: 1
Maximum: 196844
Zeros: 0
Zeros (%): 0.0%
Memory size: 44.4 KiB

Quantile statistics

Minimum: 1
5th percentile: 4.95
Q1: 106
Median: 317.5
Q3: 805.25
95th percentile: 2927.8
Maximum: 196844
Range: 196843
Interquartile range (IQR): 699.25

Descriptive statistics

Standard deviation: 4189.784022
Coefficient of variation (CV): 4.402777694
Kurtosis: 944.7464797
Mean: 951.6228873
Median absolute deviation (MAD): 253.5
Skewness: 25.14834388
Sum: 5405218
Variance: 17554290.15
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values
Value               Count  Frequency (%)
1                     113  2.0%
2                      71  1.2%
3                      51  0.9%
4                      49  0.9%
5                      35  0.6%
6                      29  0.5%
12                     24  0.4%
88                     21  0.4%
72                     21  0.4%
7                      20  0.4%
Other values (1827)  5246  92.4%

Minimum 5 values
Value               Count  Frequency (%)
1                     113  2.0%
2                      71  1.2%
3                      51  0.9%
4                      49  0.9%
5                      35  0.6%

Maximum 5 values
Value               Count  Frequency (%)
196844                  1  < 0.1%
80263                   1  < 0.1%
77373                   1  < 0.1%
69993                   1  < 0.1%
64549                   1  < 0.1%

qty_products
Real number (ℝ≥0)

Distinct: 530
Distinct (%): 9.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 92.81954225
Minimum: 1
Maximum: 7838
Zeros: 0
Zeros (%): 0.0%
Memory size: 44.4 KiB

Quantile statistics

Minimum: 1
5th percentile: 2
Q1: 14
Median: 41
Q3: 107
95th percentile: 333
Maximum: 7838
Range: 7837
Interquartile range (IQR): 93

Descriptive statistics

Standard deviation: 210.8178433
Coefficient of variation (CV): 2.271265708
Kurtosis: 509.2485706
Mean: 92.81954225
Median absolute deviation (MAD): 33
Skewness: 17.73699118
Sum: 527215
Variance: 44444.16306
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values
Value               Count  Frequency (%)
1                     253  4.5%
2                     148  2.6%
3                     107  1.9%
10                     99  1.7%
6                      98  1.7%
9                      92  1.6%
5                      91  1.6%
4                      87  1.5%
7                      82  1.4%
8                      81  1.4%
Other values (520)   4542  80.0%

Minimum 5 values
Value               Count  Frequency (%)
1                     253  4.5%
2                     148  2.6%
3                     107  1.9%
4                      87  1.5%
5                      91  1.6%

Maximum 5 values
Value               Count  Frequency (%)
7838                    1  < 0.1%
5673                    1  < 0.1%
5095                    1  < 0.1%
4580                    1  < 0.1%
2698                    1  < 0.1%

avg_ticket
Real number (ℝ≥0)

SKEWED

Distinct: 5477
Distinct (%): 96.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 31.15775098
Minimum: 0.42
Maximum: 13305.5
Zeros: 0
Zeros (%): 0.0%
Memory size: 44.4 KiB

Quantile statistics

Minimum: 0.42
5th percentile: 3.460929841
Q1: 7.948445596
Median: 15.84992017
Q3: 21.94922722
95th percentile: 75.94814286
Maximum: 13305.5
Range: 13305.08
Interquartile range (IQR): 14.00078163

Descriptive statistics

Standard deviation: 210.2054605
Coefficient of variation (CV): 6.746490165
Kurtosis: 2868.43849
Mean: 31.15775098
Median absolute deviation (MAD): 7.486119959
Skewness: 48.348493
Sum: 176976.0255
Variance: 44186.33564
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values
Value               Count  Frequency (%)
3.75                   11  0.2%
4.95                   10  0.2%
1.25                    9  0.2%
2.95                    9  0.2%
7.95                    8  0.1%
12.75                   7  0.1%
8.25                    7  0.1%
1.65                    7  0.1%
5.95                    6  0.1%
3.35                    6  0.1%
Other values (5467)  5600  98.6%

Minimum 5 values
Value               Count  Frequency (%)
0.42                    3  0.1%
0.535                   1  < 0.1%
0.65                    1  < 0.1%
0.79                    1  < 0.1%
0.8371428571            1  < 0.1%

Maximum 5 values
Value               Count  Frequency (%)
13305.5                 1  < 0.1%
4307.18                 1  < 0.1%
3861                    1  < 0.1%
3202.92                 1  < 0.1%
3096                    1  < 0.1%

frequency
Real number (ℝ≥0)

Distinct: 1226
Distinct (%): 21.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.5466086757
Minimum: 0.005449591281
Maximum: 17
Zeros: 0
Zeros (%): 0.0%
Memory size: 44.4 KiB

Quantile statistics

Minimum: 0.005449591281
5th percentile: 0.01102941176
Q1: 0.02491103203
Median: 1
Q3: 1
95th percentile: 1
Maximum: 17
Range: 16.99455041
Interquartile range (IQR): 0.975088968

Descriptive statistics

Standard deviation: 0.5504660073
Coefficient of variation (CV): 1.007056843
Kurtosis: 139.3198535
Mean: 0.5466086757
Median absolute deviation (MAD): 0
Skewness: 4.868938423
Sum: 3104.737278
Variance: 0.3030128252
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values
Value               Count  Frequency (%)
1                    2867  50.5%
2                      47  0.8%
0.0625                 17  0.3%
0.02777777778          17  0.3%
0.02380952381          16  0.3%
0.09090909091          15  0.3%
0.08333333333          15  0.3%
0.03448275862          14  0.2%
0.02941176471          14  0.2%
0.07692307692          13  0.2%
Other values (1216)  2645  46.6%

Minimum 5 values
Value               Count  Frequency (%)
0.005449591281          1  < 0.1%
0.005464480874          1  < 0.1%
0.005479452055          1  < 0.1%
0.005494505495          1  < 0.1%
0.005586592179          2  < 0.1%

Maximum 5 values
Value               Count  Frequency (%)
17                      1  < 0.1%
4                       1  < 0.1%
3                       5  0.1%
2                      47  0.8%
1.142857143             1  < 0.1%

qty_returns
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct: 204
Distinct (%): 3.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 15.8584507
Minimum: 0
Maximum: 8004
Zeros: 4192
Zeros (%): 73.8%
Memory size: 44.4 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 1
95th percentile: 37
Maximum: 8004
Range: 8004
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 164.9904426
Coefficient of variation (CV): 10.4039446
Kurtosis: 1165.508068
Mean: 15.8584507
Median absolute deviation (MAD): 0
Skewness: 29.65764799
Sum: 90076
Variance: 27221.84613
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values
Value               Count  Frequency (%)
0                    4192  73.8%
1                     169  3.0%
2                     148  2.6%
3                     105  1.8%
4                      89  1.6%
6                      78  1.4%
5                      61  1.1%
12                     51  0.9%
7                      44  0.8%
8                      43  0.8%
Other values (194)    700  12.3%

Minimum 5 values
Value               Count  Frequency (%)
0                    4192  73.8%
1                     169  3.0%
2                     148  2.6%
3                     105  1.8%
4                      89  1.6%

Maximum 5 values
Value               Count  Frequency (%)
8004                    1  < 0.1%
4427                    1  < 0.1%
3768                    1  < 0.1%
3332                    1  < 0.1%
2878                    1  < 0.1%

returns_ratio
Real number (ℝ≥0)

ZEROS

Distinct: 1377
Distinct (%): 24.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.0101194619
Minimum: 0
Maximum: 0.9863013699
Zeros: 4192
Zeros (%): 73.8%
Memory size: 44.4 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 0.000971722138
95th percentile: 0.0388574896
Maximum: 0.9863013699
Range: 0.9863013699
Interquartile range (IQR): 0.000971722138

Descriptive statistics

Standard deviation: 0.04805282652
Coefficient of variation (CV): 4.748555507
Kurtosis: 106.5617978
Mean: 0.0101194619
Median absolute deviation (MAD): 0
Skewness: 9.135895751
Sum: 57.47854359
Variance: 0.002309074137
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values
Value               Count  Frequency (%)
0                    4192  73.8%
0.009661835749          4  0.1%
0.01075268817           4  0.1%
0.03846153846           3  0.1%
0.02380952381           3  0.1%
0.002463054187          3  0.1%
0.03225806452           3  0.1%
0.01219512195           3  0.1%
0.009259259259          3  0.1%
0.01492537313           3  0.1%
Other values (1367)  1459  25.7%

Minimum 5 values
Value               Count  Frequency (%)
0                    4192  73.8%
0.0001169636243         1  < 0.1%
0.0001839926403         1  < 0.1%
0.0002816901408         1  < 0.1%
0.0003140703518         1  < 0.1%

Maximum 5 values
Value               Count  Frequency (%)
0.9863013699            1  < 0.1%
0.8333333333            1  < 0.1%
0.6333333333            1  < 0.1%
0.6115107914            1  < 0.1%
0.6008836524            1  < 0.1%

avg_basket_size
Real number (ℝ≥0)

Distinct: 2365
Distinct (%): 41.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 248.6226055
Minimum: 1
Maximum: 14149
Zeros: 0
Zeros (%): 0.0%
Memory size: 44.4 KiB

Quantile statistics

Minimum: 1
5th percentile: 4
Q1: 75
Median: 151.8333333
Q3: 290.6354167
95th percentile: 732
Maximum: 14149
Range: 14148
Interquartile range (IQR): 215.6354167

Descriptive statistics

Standard deviation: 448.2867659
Coefficient of variation (CV): 1.803081281
Kurtosis: 371.3144505
Mean: 248.6226055
Median absolute deviation (MAD): 96.33333333
Skewness: 14.70230138
Sum: 1412176.399
Variance: 200961.0245
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values
Value               Count  Frequency (%)
1                     114  2.0%
2                      70  1.2%
3                      51  0.9%
4                      49  0.9%
5                      35  0.6%
6                      29  0.5%
12                     25  0.4%
100                    22  0.4%
72                     22  0.4%
73                     21  0.4%
Other values (2355)  5242  92.3%

Minimum 5 values
Value               Count  Frequency (%)
1                     114  2.0%
2                      70  1.2%
3                      51  0.9%
3.333333333             1  < 0.1%
4                      49  0.9%

Maximum 5 values
Value               Count  Frequency (%)
14149                   1  < 0.1%
13956                   1  < 0.1%
9014                    1  < 0.1%
7824                    1  < 0.1%
5963                    1  < 0.1%

avg_unique_basket_size
Real number (ℝ≥0)

Distinct: 1174
Distinct (%): 20.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 37.32401733
Minimum: 0.2
Maximum: 1109
Zeros: 0
Zeros (%): 0.0%
Memory size: 44.4 KiB

Quantile statistics

Minimum: 0.2
5th percentile: 1
Q1: 7.285714286
Median: 15.14646465
Q3: 31
95th percentile: 173.05
Maximum: 1109
Range: 1108.8
Interquartile range (IQR): 23.71428571

Descriptive statistics

Standard deviation: 76.97568488
Coefficient of variation (CV): 2.062363335
Kurtosis: 32.80289579
Mean: 37.32401733
Median absolute deviation (MAD): 9.853535354
Skewness: 5.067619586
Sum: 212000.4185
Variance: 5925.256062
Monotonicity: Not monotonic
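
Monotonicity describes the order in which values appear in the dataset, not their distribution: every numeric column here is "Not monotonic", which is what one would expect unless the rows were sorted by that column. A minimal sketch of such a check; the labels for the monotonic cases are illustrative, since this report does not show the exact wording pandas-profiling uses for them.

```python
def monotonicity(values):
    """Classify a sequence by the order of its values (labels are illustrative)."""
    pairs = list(zip(values, values[1:]))
    if all(a < b for a, b in pairs):
        return "Strictly increasing"
    if all(a <= b for a, b in pairs):
        return "Increasing"
    if all(a > b for a, b in pairs):
        return "Strictly decreasing"
    if all(a >= b for a, b in pairs):
        return "Decreasing"
    return "Not monotonic"


# customer_id in the first four sample rows
print(monotonicity([17850, 13047, 12583, 13748]))  # Not monotonic
```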

Histogram with fixed size bins (bins=50)

Common values
Value               Count  Frequency (%)
1                     275  4.8%
2                     159  2.8%
3                     114  2.0%
9                     105  1.8%
10                    104  1.8%
8                     103  1.8%
6                     101  1.8%
5                     101  1.8%
7                     100  1.8%
13                     97  1.7%
Other values (1164)  4421  77.8%

Minimum 5 values
Value               Count  Frequency (%)
0.2                     1  < 0.1%
0.25                    3  0.1%
0.3333333333            7  0.1%
0.4                     1  < 0.1%
0.4090909091            1  < 0.1%

Maximum 5 values
Value               Count  Frequency (%)
1109                    1  < 0.1%
748                     1  < 0.1%
730                     1  < 0.1%
720                     1  < 0.1%
703                     1  < 0.1%

period_of_buy_day
Categorical

Distinct: 3
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 44.4 KiB

Value               Count  Frequency (%)
before_11            2022  35.6%
between_11_20        1942  34.2%
after_20             1716  30.2%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 13
Median length: 9
Mean length: 10.06549296
Min length: 8
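
The length statistics can be recomputed from the value counts alone, because every row holds one of three labels: before_11 (9 characters, 2022 rows), between_11_20 (13 characters, 1942 rows) and after_20 (8 characters, 1716 rows). The mean is (9 × 2022 + 13 × 1942 + 8 × 1716) / 5680 = 57172 / 5680 ≈ 10.0655. The same calculation in code:

```python
from statistics import median

counts = {"before_11": 2022, "between_11_20": 1942, "after_20": 1716}
# One length entry per row, rebuilt from the reported counts
lengths = [len(label) for label, n in counts.items() for _ in range(n)]

print(max(lengths), median(lengths), min(lengths))  # 13 9.0 8
print(sum(lengths) / len(lengths))                  # about 10.0655
```

The median is 9 because the 1716 rows of length 8 make up less than half of the 5680 rows, and the 2022 rows of length 9 carry the cumulative count past the midpoint.
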

period_of_buy_quarter
Categorical

Distinct: 4
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 44.4 KiB

Value               Count  Frequency (%)
4                    2468  43.5%
3                    1157  20.4%
2                    1090  19.2%
1                     965  17.0%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1
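
Each Frequency (%) entry is a category's count divided by the 5680 observations and rounded to one decimal. The sketch below does this with `collections.Counter`, using a column rebuilt from the reported period_of_buy_quarter counts. Row order is irrelevant for this calculation.

```python
from collections import Counter

# Rebuild the column from the reported counts
quarters = [4] * 2468 + [3] * 1157 + [2] * 1090 + [1] * 965
n = len(quarters)

for value, count in Counter(quarters).most_common():
    print(f"{value}  {count}  {100 * count / n:.1f}%")
```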

country
Categorical

Distinct: 36
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 44.4 KiB

Value               Count  Frequency (%)
United Kingdom       5226  92.0%
Germany                94  1.7%
France                 90  1.6%
EIRE                   29  0.5%
Spain                  27  0.5%
Belgium                24  0.4%
Switzerland            23  0.4%
Portugal               20  0.4%
Italy                  14  0.2%
Finland                12  0.2%
Other values (26)     121  2.1%

Frequencies of value counts

Unique

Unique: 8
Unique (%): 0.1%

Histogram of lengths of the category

Length

Max length: 20
Median length: 14
Mean length: 13.43292254
Min length: 3
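
In this report, Unique appears to count categories that occur in exactly one row, which differs from Distinct (the number of different categories). country has 36 distinct values, 8 of which appear only once (8 / 5680 ≈ 0.1%), while period_of_buy_day and period_of_buy_quarter, whose categories all repeat, show Unique = 0. This reading is an inference from those figures. A sketch of both counts on a small hypothetical column:

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (number of distinct values, number of values occurring exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique


# Hypothetical country column
sample = ["United Kingdom", "United Kingdom", "Germany", "France", "France", "Iceland"]
print(distinct_and_unique(sample))  # (4, 2): Germany and Iceland occur once
```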

Interactions

(12 × 12 grid of interaction plots between the numeric variables; images not preserved)

Correlations


Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates a perfect negative linear correlation, 0 no linear correlation, and +1 a perfect positive linear correlation. Furthermore, r is invariant under separate changes in location and positive rescaling of the two variables, so for a linear relationship the steepness of the line does not affect r; only the direction of its slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
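
A direct translation of that definition into plain Python. It uses population moments; the 1/n factors cancel, so sample moments give the same r. The two examples show that the size of a linear slope does not change r, only its sign does.

```python
import math


def pearson_r(x, y):
    """Pearson's r: covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined for a constant variable")
    # sqrt(var_x * var_y) is the product of the two standard deviations
    return cov / math.sqrt(var_x * var_y)


x = [1, 2, 3, 4]
print(pearson_r(x, [2 * v + 5 for v in x]))  # 1.0
print(pearson_r(x, [-3 * v for v in x]))     # -1.0
```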

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables, so it captures nonlinear monotonic relationships better than Pearson's r. Its value lies between -1 and +1: -1 indicates a perfect negative monotonic correlation, 0 no monotonic correlation, and +1 a perfect positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
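
Following that description, ρ is Pearson's r computed on ranks. Tied values receive the average of the ranks they span. This is the usual convention, and `scipy.stats.spearmanr` uses it as well. The sketch includes a compact Pearson helper so it stands alone. The cubic example shows a relationship that is monotonic but not linear, which still scores 1.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
print(spearman_rho(x, [v ** 3 for v in x]))  # 1.0
```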

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures the ordinal association between two variables. Its value lies between -1 and +1: -1 indicates a perfect negative correlation, 0 no correlation, and +1 a perfect positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n - 1)/2.
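
Counting concordant and discordant pairs directly gives the sketch below. It is the O(n²) form of that definition, known as τ-a, which ignores ties. Library implementations such as `scipy.stats.kendalltau` default to the tie-adjusted τ-b, so results differ when ties are present.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # sign == 0 (a tie in x or y) counts as neither
    return (concordant - discordant) / (n * (n - 1) / 2)


print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))  # (5 - 1) / 6, about 0.667
```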

Phik (φk)

Phik (φk) is a recent, practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient for a bivariate normal input distribution. Extensive documentation is available for the phik package.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The plain empirical estimator of Cramér's V is known to be biased, even for large samples, so we use the bias-corrected measure proposed by Bergsma (2013).
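
A sketch of the bias-corrected estimator (Bergsma, 2013), computed from a contingency table of counts. φ² = χ²/n is reduced by its expected bias (k - 1)(r - 1)/(n - 1) and floored at 0. The row and column counts r and k are corrected in a similar way before the square root is taken. This illustrates the method; whether pandas-profiling's internal code follows exactly these steps is not shown in the report.

```python
import math


def cramers_v_corrected(table):
    """Bias-corrected Cramer's V for a contingency table given as rows of counts.

    Assumes every row and column total is positive.
    """
    r, k = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return math.sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))


print(cramers_v_corrected([[50, 0], [0, 50]]))    # close to 1: perfect association
print(cramers_v_corrected([[25, 25], [25, 25]]))  # 0.0: independence
```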

Missing values

(2 missing-value charts; images not preserved)

Sample

First rows

      customer_id  gross_revenue  recency_days  qty_invoices  qty_items  qty_products  avg_ticket  frequency  qty_returns  returns_ratio  avg_basket_size  avg_unique_basket_size  period_of_buy_day  period_of_buy_quarter  country
   0        17850        5391.21         372.0          34.0     1733.0         297.0   18.152222  17.000000         40.0       0.023081        50.970588                0.617647  after_20                               4  United Kingdom
   1        13047        3232.59          56.0           9.0     1390.0         171.0   18.904035   0.028302         35.0       0.025180       154.444444               11.666667  after_20                               4  United Kingdom
   2        12583        6705.38           2.0          15.0     5028.0         232.0   28.902500   0.040323         50.0       0.009944       335.200000                7.600000  before_11                              4  France
   3        13748         948.25          95.0           5.0      439.0          28.0   33.866071   0.017921          0.0       0.000000        87.800000                4.800000  before_11                              3  United Kingdom
   4        15100         876.00         333.0           3.0       80.0           3.0  292.000000   0.073171         22.0       0.275000        26.666667                0.333333  before_11                              4  United Kingdom
   5        15291        4623.30          25.0          14.0     2102.0         102.0   45.326471   0.040115         29.0       0.013796       150.142857                4.357143  before_11                              1  United Kingdom
   6        14688        5630.87           7.0          21.0     3621.0         327.0   17.219786   0.057221        399.0       0.110191       172.428571                7.047619  after_20                               1  United Kingdom
   7        17809        5411.91          16.0          12.0     2057.0          61.0   88.719836   0.033520         41.0       0.019932       171.416667                3.833333  before_11                              4  United Kingdom
   8        15311       60767.90           0.0          91.0    38194.0        2379.0   25.543464   0.243316        474.0       0.012410       419.714286                6.230769  before_11                              4  United Kingdom
   9        16098        2005.63          87.0           7.0      613.0          67.0   29.934776   0.024390          0.0       0.000000        87.571429                4.857143  between_11_20                          3  United Kingdom
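
Several columns look like ratios of other columns. The first rows above are consistent with avg_ticket = gross_revenue / qty_products, avg_basket_size = qty_items / qty_invoices and returns_ratio = qty_returns / qty_items. For row 0, 5391.21 / 297 ≈ 18.152222, 1733 / 34 ≈ 50.970588 and 40 / 1733 ≈ 0.023081. These formulas are inferred from the sample and are not documented in the report. If returns_ratio is indeed qty_returns / qty_items, it is zero exactly when qty_returns is zero, which would explain why both columns report 4192 zeros. The check below applies the inferred formulas to four of the sample rows:

```python
# Columns copied from rows 0, 1, 2 and 4 of the first rows above:
# (gross_revenue, qty_invoices, qty_items, qty_products, qty_returns,
#  avg_ticket, avg_basket_size, returns_ratio)
rows = [
    (5391.21, 34, 1733, 297, 40, 18.152222, 50.970588, 0.023081),
    (3232.59, 9, 1390, 171, 35, 18.904035, 154.444444, 0.025180),
    (6705.38, 15, 5028, 232, 50, 28.902500, 335.200000, 0.009944),
    (876.00, 3, 80, 3, 22, 292.000000, 26.666667, 0.275000),
]


def matches(computed, reported, places=6):
    """The sample shows six decimals, so compare after rounding to six places."""
    return round(computed, places) == round(reported, places)


for revenue, invoices, items, products, returns, ticket, basket, ratio in rows:
    print(matches(revenue / products, ticket),
          matches(items / invoices, basket),
          matches(returns / items, ratio))
```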

Last rows

      customer_id  gross_revenue  recency_days  qty_invoices  qty_items  qty_products  avg_ticket  frequency  qty_returns  returns_ratio  avg_basket_size  avg_unique_basket_size  period_of_buy_day  period_of_buy_quarter  country
5670        22700        4839.42           1.0           1.0     1074.0          62.0   78.055161        1.0          0.0            0.0           1074.0                    55.0  before_11                              4  United Kingdom
5671        13298         360.00           1.0           1.0       96.0           2.0  180.000000        1.0          0.0            0.0             96.0                     2.0  before_11                              4  United Kingdom
5672        14569         227.39           1.0           1.0       79.0          12.0   18.949167        1.0          0.0            0.0             79.0                    10.0  before_11                              4  United Kingdom
5673        22704          17.90           1.0           1.0       14.0           7.0    2.557143        1.0          0.0            0.0             14.0                     7.0  before_11                              4  United Kingdom
5674        22705           3.35           1.0           1.0        2.0           2.0    1.675000        1.0          0.0            0.0              2.0                     2.0  before_11                              4  United Kingdom
5675        22706        5699.00           1.0           1.0     1747.0         634.0    8.988959        1.0          0.0            0.0           1747.0                   634.0  before_11                              4  United Kingdom
5676        22707        6756.06           0.0           1.0     2010.0         730.0    9.254877        1.0          0.0            0.0           2010.0                   730.0  before_11                              4  United Kingdom
5677        22708        3217.20           0.0           1.0      654.0          59.0   54.528814        1.0          0.0            0.0            654.0                    56.0  before_11                              4  United Kingdom
5678        22709        3950.72           0.0           1.0      731.0         217.0   18.206083        1.0          0.0            0.0            731.0                   217.0  before_11                              4  United Kingdom
5679        12713         794.55           0.0           1.0      505.0          37.0   21.474324        1.0          0.0            0.0            505.0                    37.0  before_11                              4  Germany